Estimation of Processing Time of an API
Learn to estimate the processing time of an API.
In the previous lesson, we learned that the response time is a combination of latency and processing time, as given in the following equation:

$$Response\ time = Latency + Processing\ time$$
Let's start by estimating the processing time of an API.
Processing time#
The processing time of a server is defined as the time a server takes to process a request and prepare a response. This is one of the important factors that affect response time. Therefore, estimating processing time is an important part of estimating the total response time of a service.
The illustration below shows, at a high level, what constitutes processing time in an API. The server interacts with the database to execute queries for data retrieval, which might also involve file handling. Processing time includes the round trip from the API gateway to downstream services, the request execution time, and the response preparation time.
There is no rule of thumb for calculating the exact processing time. It depends on several variables, such as the services involved, the components within those services, and the technologies used (both hardware and software). Usually, processing involves analyzing a query and fetching the data from the server's memory or the corresponding database. The processing time primarily depends on the three factors listed below:
The type of request
The application server’s time to handle a request
Database query execution time
The processing time also depends on the specifications of the machine that processes the user's request. There are plenty of servers available with different specifications supporting different requirements. We'll consider a typical server from Amazon Web Services (AWS) whose specifications are defined below:
Server Specifications
| Component | Specification |
| --- | --- |
| Sockets | 2 |
| Processor | Intel Xeon X2686 |
| RAM | 240 GB |
| Cores | 36 cores (72 hardware threads) |
| Cache (L3) | 45 MB |
| Storage | 15 TB |
Request processing estimation#
In this section, we’ll estimate the time a server takes to handle a request, depending on the type of request. Broadly, there are two types of requests, bounded by either the CPU or memory.
CPU bound: These are requests where the CPU acts as a limiting factor.
Memory bound: These are requests where the memory acts as a limiting factor.
Let's say that each CPU-bound request takes 200 milliseconds (ms), and each memory-bound request takes 50 ms to complete. The requests per second (RPS) for each type are calculated using the following formulas.

$$RPS_{CPU} = \frac{N_{threads}}{T_{task}} = \frac{72}{0.2\ s} = 360\ RPS$$

The following terms are used in this calculation:

- $RPS_{CPU}$: The CPU-bound requests per second
- $N_{threads}$: The number of hardware threads (CPU threads)
- $T_{task}$: The time each task takes to complete
Similarly, for memory-bound requests, if each worker consumes 300 MB of memory, RPS is calculated as:

$$RPS_{memory} = \frac{RAM_{total}/Worker_{memory}}{T_{task}} = \frac{240\ GB/300\ MB}{0.05\ s} = \frac{800}{0.05\ s} = 16{,}000\ RPS$$

The following terms are used in this calculation:

- $RPS_{memory}$: The memory-bound requests per second
- $RAM_{total}$: The total size of the RAM
- $Worker_{memory}$: The memory consumed by a worker that manages a request
If we consider that half the requests are CPU bound and half are memory bound, then the average RPS would be:

$$RPS_{avg} = \frac{RPS_{CPU} + RPS_{memory}}{2} = \frac{360 + 16{,}000}{2} = 8{,}180 \approx 8{,}000\ RPS$$

Considering the calculations above, the system takes approximately 0.125 ms ($1000\ ms \div 8{,}000\ RPS$) to serve a single request on average.
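The arithmetic above can be reproduced with a short back-of-envelope script. This is only a sketch based on the server specifications from the table; the variable names are our own:

```python
# Back-of-envelope RPS estimate for the AWS server described above.
HW_THREADS = 72        # hardware threads
RAM_MB = 240 * 1000    # 240 GB of RAM, in MB
WORKER_MB = 300        # memory consumed per worker
CPU_TASK_S = 0.200     # each CPU-bound request takes 200 ms
MEM_TASK_S = 0.050     # each memory-bound request takes 50 ms

# CPU bound: each hardware thread completes 1 / 0.2 = 5 requests per second.
rps_cpu = HW_THREADS / CPU_TASK_S        # ~360 RPS

# Memory bound: RAM limits us to 800 concurrent workers,
# each finishing 1 / 0.05 = 20 requests per second.
workers = RAM_MB // WORKER_MB            # 800 workers
rps_mem = workers / MEM_TASK_S           # ~16,000 RPS

# Half the traffic is CPU bound, half memory bound.
rps_avg = (rps_cpu + rps_mem) / 2        # ~8,180 RPS
time_per_request_ms = 1000 / rps_avg     # ~0.12 ms; the lesson rounds
                                         # 8,180 down to 8,000 RPS,
                                         # which gives 0.125 ms
print(round(rps_cpu), round(rps_mem), round(rps_avg),
      round(time_per_request_ms, 3))
```

Changing any one constant, such as the per-worker memory, immediately shows how sensitive the estimate is to the workload mix.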
Quiz
Question
Let’s consider a system having 72 cores (144 hardware threads) with 128 GB RAM. Each CPU-bound request takes 100 ms, and each memory-bound request takes 70 ms. How many requests per second (RPS) does each system handle if each worker consumes 200 MB of memory?
Requests per second handled by the CPU are calculated as:

$$RPS_{CPU} = \frac{144}{0.1\ s} = 1{,}440\ RPS$$
Requests per second handled by memory are calculated as:

$$RPS_{memory} = \frac{128\ GB/200\ MB}{0.07\ s} = \frac{640}{0.07\ s} \approx 9{,}143\ RPS$$
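The quiz figures can be checked with the same arithmetic (a sketch, using the quiz's stated hardware of 144 threads, 128 GB RAM, and 200 MB per worker):

```python
rps_cpu = 144 / 0.100          # 144 threads, 100 ms per request -> 1,440 RPS
workers = (128 * 1000) // 200  # 128 GB RAM / 200 MB per worker -> 640 workers
rps_mem = workers / 0.070      # 70 ms per request -> ~9,143 RPS
print(round(rps_cpu), round(rps_mem))
```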
Query execution time#
The latency incurred by database queries is significant because data retrieval is a time-consuming task. Therefore, the database query execution time should be as fast as possible. All forms of caching, including filesystem, database, machine-level, and distributed caches, help greatly reduce the query execution time.
Let's see an example of how we can measure the time it takes to execute a query. We’ll use MySQL as the database type because it is widely used in the industry. Query executions are measured in the following way:
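One way to measure this is with MySQL's built-in session profiling. A minimal sketch follows; the `users` table and the `SELECT` statement are arbitrary illustrative examples (note that `SHOW PROFILE` is deprecated in recent MySQL versions in favor of the Performance Schema):

```sql
SET profiling = 1;                 -- enable profiling for this session
SELECT * FROM users WHERE id = 1;  -- run the query to be measured
SHOW PROFILES;                     -- list recent queries with their durations
SHOW PROFILE FOR QUERY 1;          -- per-stage breakdown of the first query
```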
Note: The technique above is referred to as profiling. In general, an `INSERT` query takes between 0.16 and 3 ms, whereas a `SELECT` query takes approximately 0.13 to 2 ms to execute on a MySQL server.
This query execution time includes memory access time as well. Depending on the query, the server either writes data to memory while saving it to the database or reads it from memory. These figures assume an optimized database with optimized structures and relationships, running on the AWS server specified above.
Estimating processing time#
From the previous two sections, we have identified that processing time depends on the application server's computation time and the database's query handling time. However, network latency between these servers is also a key factor. In this section, we'll learn how the location of these communicating components affects the overall processing time. We aim to define a range, with minimum and maximum processing times, for serving a simple user request. Using that as a basis, we can judge what is plausible for a practical system.
Let’s take a look at the slides below to see how we estimate the processing time of a simple user request:
[Slideshow: six slides walking through the processing-time estimation of a simple user request]
We take the summation of all the latency and computation times to obtain the following processing time:

$$Processing\ time = 0.125\ ms + 0.5\ ms + 1.5\ ms = 2.125\ ms$$
In the slides above, we estimated a computation time of 0.125 ms for a server to handle a request (derived above) and 1.5 ms as the average time to handle a database query (based on the averages estimated in the previous section). We assumed 0.5 ms as the propagation time of the query, given that the servers are in the same data center. However, the processing time varies if the service components are located at different locations, which eventually affects the response time. The change in processing time for different scenarios is depicted in the following slides:
[Slideshow: four slides showing how processing time changes when the service components are placed in different locations]
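The effect of component placement can also be sketched numerically. The sketch below reuses the 0.125 ms computation time and 1.5 ms query execution time estimated earlier; the propagation delays for the non-local placements are illustrative assumptions, not measurements:

```python
COMPUTE_MS = 0.125   # server computation time per request (estimated earlier)
QUERY_MS = 1.5       # average database query execution time (estimated earlier)

# Illustrative propagation delays (ms) between the application server and
# the database server for different placements; assumed for this sketch.
scenarios = {
    "same data center": 0.5,
    "different data centers, same region": 10,
    "different regions": 100,
}

for placement, propagation_ms in scenarios.items():
    processing_ms = COMPUTE_MS + propagation_ms + QUERY_MS
    print(f"{placement}: ~{processing_ms:.3f} ms")
```

As the placement grows more distant, propagation quickly dominates the computation and query times.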
From the slides above, we can see that the processing time of a simple user request can vary greatly. In reality, user requests are complex, so much so that the API gateway can make calls to multiple services to compile a response to the request. In that case, depending on the type of query, two types of communication are possible:
Parallel communication: In modern applications, the ideal case is that the API gateway communicates with all the downstream services simultaneously. Each service performs its computation in parallel with the others and returns its result as soon as it is available. This approach saves time and is preferred when feasible.
Serial communication: The other scenario is when the API gateway communicates serially with all the available services. In this case, the processing time would be the sum of all the processing times taken by individual services. Serial communication is often a requirement when one service depends on another to generate its result.
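The difference between the two patterns can be sketched with hypothetical per-service processing times:

```python
# Hypothetical processing times (ms) of three downstream services.
service_times_ms = [2.1, 3.4, 1.8]

# Parallel: the gateway waits only for the slowest service.
parallel_ms = max(service_times_ms)   # 3.4 ms

# Serial: the gateway waits for each service in turn.
serial_ms = sum(service_times_ms)     # ~7.3 ms

print(parallel_ms, serial_ms)
```

With serial communication, every additional service adds its full processing time to the total, whereas with parallel communication only the slowest service matters.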
Note: In the illustration above, the API gateway processes the requests in a single step to all services in parallel processing (step 1), whereas in serial processing (steps 1, 2, and 3), the API gateway processes the requests in three steps, one after another.
Discussion#
The processing time calculated above is for rudimentary querying or storing data in a database. We performed an estimation that is based on an ideal scenario. In the practical world, several factors can affect the overall processing time of a request. Some of the factors are listed below:
The time required by a file storage service will be significantly higher because we'll be processing the file into chunks, storing those chunks in different locations, extracting the file's metadata, and storing that metadata.
The processing time may also vary depending on the operations each downstream service needs to perform to process the request. Even in parallel processing, the time service A takes to compute its result will be different from that of service B on most occasions.
In real applications, the service is provided from the nearest locations (zone and region) to minimize the response time, except in cases such as backups for disaster recovery. The resulting inter-zonal communication is another factor affecting processing time.
Sometimes, data processing needs intensive computations, like encryption, big data analytics, encoding, and so on, which also affect the processing time.
Other latency-increasing factors may include errors, network or device failures, path or machine resolution operations like hashing, and so on.
Quiz#
Let's test your knowledge of latency numbers with the following quiz. Match each item in the left column to its corresponding latency number in the right column. This isn't a test of memory, just an exercise to see how you reason about it.
Region-region communication takes…
…5 ms
Communication within a region takes…
…100 ms
Communication within the same datacenter takes…
…10 ms
Communication between two data centers within a region takes…
…0.5 ms
In the next lesson, we’ll estimate the latency of an API.